A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
In recent years, significant efforts have been made to develop kernels that can be applied to sequence data such as DNA, text, speech, video and images. The Fisher kernel and similar variants have been suggested as good ways to combine an underlying generative model in the feature space with discriminative classifiers such as SVMs. In this paper we suggest an alternative procedure to the Fisher kernel for systematically finding kernel functions that naturally handle variable-length sequence data in multimedia domains. In particular, for domains such as speech and images, we explore the use of kernel functions that take full advantage of well-known probabilistic models such as Gaussian mixtures and single full-covariance Gaussian models. We derive a kernel distance based on the Kullback-Leibler (KL) divergence between generative models.
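For the single full-covariance Gaussian case the abstract mentions, a KL-based kernel can be sketched as follows. This is a minimal illustration, assuming the common construction K(p, q) = exp(-a · (D(p‖q) + D(q‖p))) with the closed-form KL divergence between multivariate Gaussians; the scaling factor `a` and the function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form KL divergence D(N(mu0,S0) || N(mu1,S1))
    between two multivariate Gaussians."""
    d = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def kl_kernel(mu0, S0, mu1, S1, a=1.0):
    """Kernel from the symmetrized KL divergence:
    exp(-a * (D(p||q) + D(q||p))). Identical models give K = 1."""
    sym = kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)
    return np.exp(-a * sym)
```

Because the symmetrized divergence is zero only for identical models, the kernel attains its maximum of 1 there and decays as the generative models grow apart, which is the behavior an SVM kernel over model space needs.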
Evolutionary Design of the Memory Subsystem
Álvarez, Josefa Díaz, Risco-Martín, José L., Colmenar, J. Manuel
This impact is estimated at about 50% of the total energy consumption of the chip [1], which makes the memory subsystem one of the most important targets for improving both performance and energy consumption. Concerns such as thermal issues and high energy consumption can cause significant performance degradation, as well as irreversible damage to the devices, thereby increasing the energy cost. Previous work has shown that saving energy in the memory subsystem can effectively control transistor aging effects and significantly extend the lifetime of the internal structures [2]. Technological changes, combined with the development of communications, have led to the great expansion of mobile devices such as smartphones and tablets. Mobile devices have evolved rapidly to adapt to new requirements, giving support to multimedia applications. These devices are built on embedded systems, which are mainly battery-powered and usually have fewer computing resources than desktop systems. Additionally, multimedia applications are usually memory-intensive, so they have high performance requirements, which imply high energy consumption. These features increase the pressure on the whole memory subsystem. Processor registers, smaller in size, work at the same speed as the processor and consume less energy than other levels of the memory subsystem.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Spain > Extremadura (0.04)
SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis
Xing, Naili, Yeung, Sai Ho, Cai, Chenghao, Ng, Teck Khim, Wang, Wei, Yang, Kaiyuan, Yang, Nan, Zhang, Meihui, Chen, Gang, Ooi, Beng Chin
Deep learning has achieved great success in a wide spectrum of multimedia applications such as image classification, natural language processing and multimodal data analysis. Recent years have seen the development of many deep learning frameworks that provide a high-level programming interface for users to design models, conduct training and deploy inference. However, it remains challenging to build an efficient end-to-end multimedia application with most existing frameworks. Specifically, in terms of usability, it is demanding for non-experts to implement deep learning models, obtain the right settings for the entire machine learning pipeline, manage models and datasets, and exploit external data sources all together. Further, in terms of adaptability, elastic computation solutions are much needed as the actual serving workload fluctuates constantly, and scaling the hardware resources to handle the fluctuating workload is typically infeasible. To address these challenges, we introduce SINGA-Easy, a new deep learning framework that provides distributed hyper-parameter tuning at the training stage, dynamic computational cost control at the inference stage, and intuitive user interactions with multimedia contents facilitated by model explanation. Our experiments on the training and deployment of multi-modality data analysis applications show that the framework is both usable and adaptable to dynamic inference loads. We implement SINGA-Easy on top of Apache SINGA and demonstrate our system with the entire machine learning life cycle.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization
Abreu, Raphael, Santos, Joel dos, Bezerra, Eduardo
In mulsemedia applications, traditional media content (text, image, audio, video, etc.) can be related to media objects that target other human senses (e.g., smell, haptics, taste). Such applications aim at bridging the virtual and real worlds through sensors and actuators. Actuators are responsible for the execution of sensory effects (e.g., wind, heat, light), which produce sensory stimulations on the users. In these applications, sensory stimulation must happen in a timely manner with regard to the other traditional media content being presented. For example, at the moment an explosion is presented in the audiovisual content, it may be adequate to activate actuators that produce heat and light. It is common to use a declarative multimedia authoring language to relate the timestamp at which each media object is to be presented to the execution of some sensory effect. One problem in this setting is that the synchronization of media objects and sensory effects is done manually by the author(s) of the application, a process which is time-consuming and error-prone. In this paper, we present a bimodal neural network architecture to assist the synchronization task in mulsemedia applications. Our approach is based on the idea that audio and video signals can be used simultaneously to identify the timestamps at which some sensory effect should be executed. Our learning architecture combines audio and video signals for the prediction of scene components. For evaluation purposes, we construct a dataset based on Google's AudioSet. We provide experiments to validate our bimodal architecture. Our results show that the bimodal approach produces better results than several variants of unimodal architectures.
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
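The bimodal idea above, combining audio and video signals to predict scene components, can be illustrated with a late-fusion sketch. This is a minimal stand-in, not the paper's architecture: the feature dimensions, the untrained weights, and the assumption that each modality arrives as a precomputed embedding (the paper instead learns encoders end-to-end) are all illustrative:

```python
import numpy as np

def late_fusion_predict(audio_feat, video_feat, W, b):
    """Concatenate per-modality embeddings and apply a linear
    scene-component classifier with a softmax output.
    In a real bimodal network each branch would be a learned
    encoder; here we fuse fixed feature vectors for illustration."""
    fused = np.concatenate([audio_feat, video_feat])
    logits = W @ fused + b
    # numerically stable softmax over scene-component classes
    z = np.exp(logits - logits.max())
    return z / z.sum()
```

Training such a model on clips labeled with the target scene components would then let the predicted class probabilities flag candidate timestamps for triggering sensory effects.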
A Topic Model Approach to Multi-Modal Similarity
Troelsgård, Rasmus, Jensen, Bjørn Sand, Hansen, Lars Kai
Calculating similarities between objects defined by many heterogeneous data modalities is an important challenge in many multimedia applications. We use a multi-modal topic model as a basis for defining such a similarity between objects. We propose to compare the resulting similarities from different model realizations using the non-parametric Mantel test. The approach is evaluated on a music dataset.
- North America > United States > New York > New York County > New York City (0.16)
- Europe > Denmark (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Media > Music (0.70)
- Leisure & Entertainment (0.70)
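The non-parametric Mantel test used in the abstract above compares two distance (or similarity) matrices by correlating their off-diagonal entries and assessing significance under joint row/column permutations. A minimal numpy sketch; the function name and the two-sided permutation p-value convention are our assumptions:

```python
import numpy as np

def mantel_test(D1, D2, n_perm=999, rng=None):
    """Mantel test: Pearson correlation between the upper triangles
    of two square distance matrices, with a permutation p-value
    obtained by jointly permuting the rows and columns of D2."""
    rng = np.random.default_rng(rng)
    n = D1.shape[0]
    iu = np.triu_indices(n, k=1)          # off-diagonal upper triangle
    x = D1[iu]
    r_obs = np.corrcoef(x, D2[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        Dp = D2[np.ix_(p, p)]             # permute rows and columns together
        if abs(np.corrcoef(x, Dp[iu])[0, 1]) >= abs(r_obs):
            count += 1
    # add-one correction so the p-value is never exactly zero
    return r_obs, (count + 1) / (n_perm + 1)
```

Applied to the paper's setting, D1 and D2 would be the object-by-object similarity matrices produced by two different topic-model realizations, so the test indicates whether the two realizations induce significantly related similarity structures.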